Pricing

Pay per event

Clemson HGIC Home & Garden Factsheet Scraper

Scrapes the Clemson HGIC factsheet library — 2,500+ science-based factsheets on plant care, diseases, pest management, lawn care, and food preservation. Outputs structured records: HGIC ID, body sections, symptoms, causal agent, management, products, authors.

Developer

BowTiedRaccoon

Last modified: 5 days ago

What It Does

Clemson HGIC is one of the largest university extension factsheet libraries in the US, with 2,500+ documents covering the southeastern US plant palette. Each factsheet follows a consistent template with discrete sections: symptoms, causal agent (pathogen or pest binomial), management and control recommendations, and prevention. This actor parses that structure into machine-readable fields that plant-diagnosis apps, AI garden assistants, and agronomy SaaS platforms can use as grounding data.

The actor reads the Yoast sitemap index to enumerate all factsheet URLs, then fetches each page through impit with a Chrome TLS fingerprint, so no proxy or CAPTCHA solver is required.
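The section-to-field mapping described above could be sketched roughly as follows. This is a minimal illustration, not the actor's documented logic: the `FIELD_KEYWORDS` table and the `classify_sections` helper are hypothetical, and the real matching rules may differ.

```python
# Hypothetical keyword map from section headings to output fields.
# The actor's actual matching rules are not documented; this is an assumption.
FIELD_KEYWORDS = {
    "symptoms": ("symptom", "damage"),
    "causal_agent": ("causal", "cause", "pathogen"),
    "management": ("management", "control", "treatment"),
    "prevention": ("prevention", "cultural"),
}


def classify_sections(sections):
    """Assign each {heading, text} section to the first field whose keyword
    appears in the heading; sections that match nothing are skipped."""
    fields = {name: "" for name in FIELD_KEYWORDS}
    for section in sections:
        heading = section["heading"].lower()
        for name, keywords in FIELD_KEYWORDS.items():
            if any(keyword in heading for keyword in keywords):
                # Concatenate text when several sections map to one field.
                fields[name] = (fields[name] + "\n" + section["text"]).strip()
                break
    return fields


sections = [
    {"heading": "Symptoms", "text": "Brown spots"},
    {"heading": "Chemical Control", "text": "Apply fungicide"},
    {"heading": "Mowing", "text": "Mow high"},
]
result = classify_sections(sections)
```

Fields with no matching heading stay empty, which mirrors how the sample record for a non-problem factsheet omits symptom and management text.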

Use Cases

Training data for plant disease diagnosis AI and AI garden assistant models
Structured extension knowledge base for horticulture SaaS
Agronomy/landscaping content and reference data pipelines
Garden app content enrichment (symptom/treatment lookup)

Input

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| maxItems | integer | 10 | Maximum number of factsheets to scrape. Set to a large number to scrape all 2,500+ factsheets. |

Output

Each item represents one HGIC factsheet.

| Field | Type | Description |
| --- | --- | --- |
| factsheet_id | string | HGIC factsheet number, e.g. `HGIC 1223` |
| slug | string | URL slug, e.g. `turfgrasses-for-the-carolinas` |
| title | string | Factsheet title |
| category | string | Subject category: Diseases, Insects, Lawns, Soils, Vegetables, Trees & Shrubs, Flowers, Fruits & Nuts, Food Safety & Preservation, Human Health & Safety, General |
| plant_subjects | string | Comma-separated plant names from the title |
| problem_type | string | Problem type: `disease`, `insect`, `cultural`, or `none` |
| summary | string | First meaningful paragraph / introductory text |
| body_sections | string | JSON array of `{heading, text}` objects for the full structured body |
| symptoms | string | Symptom description text (for disease/pest/damage factsheets) |
| causal_agent | string | Pathogen or pest scientific/common name |
| management | string | Management and control recommendation text |
| prevention | string | Prevention and cultural practices text |
| recommended_products | string | Comma-separated trade names and chemistries found in management sections |
| related_factsheets | string | Comma-separated related factsheet links (`title |
| last_updated | string | Revision date as shown in factsheet metadata, e.g. `Feb 28, 2016` |
| authors | string | Comma-separated list of factsheet authors |
| images | string | Comma-separated image URLs embedded in the factsheet |
| factsheet_url | string | Canonical URL of the factsheet |
| scrapedAt | string | ISO-8601 timestamp when the record was scraped |

Sample Output

{
  "factsheet_id": "HGIC 1223",
  "slug": "turfgrasses-for-the-carolinas",
  "title": "Turfgrasses for the Carolinas",
  "category": "Lawns",
  "problem_type": "none",
  "summary": "For over 50 years the lawn has been an integral part of the landscape...",
  "body_sections": "[{\"heading\":\"Mowing\",\"text\":\"...\"}]",
  "last_updated": "Feb 28, 2016",
  "authors": "Millie Davenport, Gary Forrester",
  "factsheet_url": "https://hgic.clemson.edu/factsheet/turfgrasses-for-the-carolinas/"
}
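Because every field is a flat string, consumers usually need a small decoding step. The sketch below assumes records shaped like the sample above; `decode_record` is a hypothetical helper, and it treats missing fields as empty because the sample omits several documented fields.

```python
import json


def decode_record(record):
    """Turn the flat string fields of one output item into native Python types."""
    decoded = dict(record)
    # body_sections is a JSON-encoded string, so it needs its own decode step.
    decoded["body_sections"] = json.loads(record.get("body_sections") or "[]")
    # These fields are documented as comma-separated strings.
    for key in ("authors", "images", "plant_subjects"):
        raw = record.get(key) or ""
        decoded[key] = [part.strip() for part in raw.split(",") if part.strip()]
    return decoded


sample = {
    "factsheet_id": "HGIC 1223",
    "body_sections": "[{\"heading\":\"Mowing\",\"text\":\"...\"}]",
    "authors": "Millie Davenport, Gary Forrester",
}
record = decode_record(sample)
print(record["authors"])                       # ['Millie Davenport', 'Gary Forrester']
print(record["body_sections"][0]["heading"])   # Mowing
```

Splitting on commas is only safe for values that cannot contain a comma themselves, so `recommended_products` is deliberately left as a raw string here.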

Discovery Method

Reads the Yoast sitemap index at https://hgic.clemson.edu/sitemap.xml, filters for factsheet-sitemap.xml and factsheet-sitemap2.xml, and collects all /factsheet/<slug>/ URLs. The maxItems cap is applied before crawling begins.
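The discovery step can be approximated with the Python standard library, as sketched below. This is not the actor's code: the helper names are hypothetical, and plain `urllib` is an assumption that may be blocked where the actor's impit client is not.

```python
import re
import xml.etree.ElementTree as ET
from urllib.request import urlopen

SITEMAP_INDEX = "https://hgic.clemson.edu/sitemap.xml"
# Standard sitemap namespace; image <loc> tags live in another namespace
# and are therefore ignored by the lookups below.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
FACTSHEET_URL = re.compile(r"/factsheet/[^/]+/?$")


def loc_values(xml_data):
    """Return every <loc> value in a sitemap or sitemap index document."""
    root = ET.fromstring(xml_data)
    return [el.text.strip() for el in root.iterfind(".//sm:loc", NS) if el.text]


def factsheet_sitemaps(index_xml):
    """Keep child sitemaps whose file name starts with 'factsheet-sitemap'."""
    return [url for url in loc_values(index_xml)
            if url.rsplit("/", 1)[-1].startswith("factsheet-sitemap")]


def factsheet_urls(sitemap_xml, max_items):
    """Collect /factsheet/<slug>/ URLs, capped before any page is crawled."""
    urls = [url for url in loc_values(sitemap_xml) if FACTSHEET_URL.search(url)]
    return urls[:max_items]


def fetch(url):
    # Assumption: a plain HTTP GET is enough to read the sitemaps.
    with urlopen(url, timeout=30) as response:
        return response.read()


def discover(max_items=10):
    """Walk the index and stop as soon as max_items URLs are collected."""
    urls = []
    for sitemap_url in factsheet_sitemaps(fetch(SITEMAP_INDEX)):
        remaining = max_items - len(urls)
        if remaining <= 0:
            break
        urls.extend(factsheet_urls(fetch(sitemap_url), remaining))
    return urls
```

Calling `discover(max_items=10)` would return at most ten factsheet URLs; nothing is fetched until that call is made.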

Performance

Memory: 128–256 MB
Throughput: ~200 pages/minute at default concurrency (5)
Full corpus (~2,500 factsheets): ~15–20 minutes
Timeout: 2-hour default (sufficient for full corpus)
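
As a back-of-the-envelope check on these figures, the crawl time implied by the quoted throughput can be computed directly. The `estimated_minutes` helper is hypothetical, and the explanation for the gap is an assumption rather than a measured breakdown.

```python
def estimated_minutes(factsheets, pages_per_minute=200):
    """Pure fetch time implied by the quoted throughput, ignoring overhead."""
    return factsheets / pages_per_minute


print(estimated_minutes(2500))  # 12.5
```

The result sits below the quoted 15–20 minutes; the difference presumably covers sitemap discovery, parsing, and dataset writes.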
